Reinforcement Learning in Large Multi-agent Systems
Authors
Abstract
Making reinforcement learning effective in large-scale multi-agent Markov Decision Problems is a challenging task. To address this problem we propose a multi-agent variant of Q-learning: “Q Updates with Immediate Counterfactual Rewards-learning” (QUICR-learning). Given a global reward function over all agents that the large-scale system is trying to maximize, QUICR-learning breaks down the global reward into many agent-specific rewards that have the following two properties: 1) agents maximizing their agent-specific rewards tend to maximize the global reward, and 2) an agent’s action has a large influence on its agent-specific reward, allowing it to learn quickly. Each agent then uses standard Q-learning-type updates to form a policy that maximizes its agent-specific reward. Results on multi-agent grid-world problems over two topologies show that QUICR-learning can be effective with hundreds of agents and can achieve up to 300% improvements in performance over both conventional and local Q-learning in the largest tested systems.
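The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formula, so the code below assumes a common counterfactual form: an agent's reward is the global reward minus the global reward recomputed with that agent removed. The toy token-collection reward, the names `global_reward`, `counterfactual_reward`, and `q_update`, and the values of `alpha` and `gamma` are all hypothetical choices for illustration, not the paper's implementation.

```python
from typing import Dict, List, Optional, Tuple

Action = Optional[int]  # index of the cell an agent occupies; None = agent removed


def global_reward(actions: List[Action], token_values: Dict[int, float]) -> float:
    """Toy global reward (assumption): sum of token values over distinct occupied cells."""
    covered = {a for a in actions if a is not None}
    return sum(token_values.get(cell, 0.0) for cell in covered)


def counterfactual_reward(
    actions: List[Action], i: int, token_values: Dict[int, float]
) -> float:
    """Assumed agent-specific reward: G(all agents) - G(all agents except agent i)."""
    without_i = list(actions)
    without_i[i] = None  # counterfactual world in which agent i takes no part
    return global_reward(actions, token_values) - global_reward(without_i, token_values)


def q_update(
    q: Dict[Tuple[int, int], float],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    n_actions: int,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Standard tabular Q-learning update, driven by the agent-specific reward."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    tokens = {0: 1.0, 1: 5.0}
    joint = [1, 1, 0]  # agents 0 and 1 share cell 1; agent 2 is alone on cell 0
    for agent in range(len(joint)):
        print(agent, counterfactual_reward(joint, agent, tokens))
```

In this toy setting, two agents sharing a cell each receive zero counterfactual reward because removing either one leaves the global reward unchanged, while an agent that alone covers a token receives that token's full value. This is one way an agent-specific signal can stay aligned with the global reward while remaining sensitive to the agent's own action.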
Similar resources
Voltage Coordination of FACTS Devices in Power Systems Using RL-Based Multi-Agent Systems
This paper describes how multi-agent system technology can be used as the underpinning platform for voltage control in power systems. In this study, some FACTS (flexible AC transmission systems) devices are properly designed to coordinate their decisions and actions in order to provide a coordinated secondary voltage control mechanism based on multi-agent theory. Each device here is modeled as ...
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics
In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...
A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem
Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...
Multi-agent learning in mobilized ad-hoc networks
In large, distributed systems such as mobilized ad-hoc networks, centralized learning of routing or movement policies may be impractical. We need to employ multi-agent learning algorithms that can learn independently, without the need for extensive coordination. Using only simple coordination signals such as a global reward value, we show that reinforcement learning methods can be used to con...